From Fundamentals to Advanced Concepts with Full Theory & Syntax
A string is a sequence of characters enclosed in quotes. Strings are one of the most commonly used data types in Python and are used to represent textual data such as names, messages, sentences, file paths, and more.
str_var = "Hello"
str_var = 'Hello'
str_var = """Hello World"""
Python allows strings to be created using single quotes (' '), double quotes (" "), or triple quotes (''' ''' or """ """). Triple-quoted strings support multi-line text.
String methods are built-in functions that operate on string objects. Because strings are immutable, these methods never change the original string; they return a new object instead (usually a new string, or a list in the case of split()).
text = "Hello World"
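text.upper()                     # 'HELLO WORLD'
text.lower()                     # 'hello world'
text.title()                     # 'Hello World'
text.strip()                     # 'Hello World' (no surrounding whitespace to remove)
text.replace("World", "Python")  # 'Hello Python'
text.split()                     # ['Hello', 'World']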
These methods are essential for cleaning, formatting, and transforming text data.
String formatting allows embedding variables and expressions inside strings dynamically. This improves readability and reduces concatenation errors.
name = "Alice"
age = 25
# f-string (recommended)
message = f"My name is {name} and I am {age} years old."
# format() method
message = "My name is {} and I am {} years old.".format(name, age)
# %-formatting
message = "My name is %s and I am %d years old." % (name, age)
F-strings (introduced in Python 3.6) are generally the most readable of the three approaches and are usually the fastest as well.
String operations allow combining, repeating, slicing, and manipulating strings using operators.
a = "Hello"
b = "World"
# Concatenation
result = a + " " + b
# Repetition
repeat = a * 3
These operations are widely used in text construction and formatting.
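Strings can be compared using relational operators. Comparison is lexicographic: strings are compared character by character using Unicode code points, so uppercase letters sort before lowercase ones.
"a" == "a"          # True
"a" != "b"          # True
"a" < "b"           # True
"apple" > "banana"  # False, because "a" comes before "b"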
This is useful for sorting, validation, and conditional logic.
String parsing is the process of extracting meaningful data from a string. It is widely used when working with logs, CSV files, or formatted text.
data = "Name:Alice,Age:25,City:London"
parts = data.split(",")
Parsing enables structured data extraction from unstructured text.
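As a minimal sketch, the key/value text above can be turned into a dictionary by splitting twice: first on commas, then on the colon inside each part. The helper name `parse_record` is hypothetical, and the sketch assumes every part has the form `Key:Value`.

```python
def parse_record(data):
    """Hypothetical helper: turn 'Key:Value,Key:Value' text into a dict."""
    record = {}
    for part in data.split(","):
        # partition() splits on the first colon only, so values may contain colons
        key, _, value = part.partition(":")
        record[key.strip()] = value.strip()
    return record


record = parse_record("Name:Alice,Age:25,City:London")
print(record)  # {'Name': 'Alice', 'Age': '25', 'City': 'London'}
```

Note that every value is still a string; converting "25" to a number is a separate step.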
Regular expressions (regex) provide a powerful way to search, match, and manipulate text patterns.
import re
pattern = r"\d+"
text = "Age is 25"
match = re.search(pattern, text)
Regex is essential for validation, extraction, and advanced text processing.
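The search above returns a match object rather than the matched text. A short sketch of how the result might be used, including the case where nothing matches:

```python
import re

text = "Age is 25"
match = re.search(r"\d+", text)

# re.search() returns None when nothing matches, so check before using the result
if match:
    age = int(match.group())  # the matched text, converted to an integer
    print(age, match.span())  # 25 (7, 9)

no_match = re.search(r"\d+", "No digits here")
print(no_match)  # None
```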
String manipulation refers to modifying strings by replacing, trimming, splitting, joining, or formatting them.
text = " Hello World "
cleaned = text.strip().replace("World", "Python")
These operations are foundational for data preprocessing.
String interpolation is the process of embedding variables or expressions within a string dynamically.
name = "John"
score = 95
message = f"{name} scored {score}% in the exam."
It improves readability and reduces formatting errors.
String encoding converts a string into bytes using a specified character encoding such as UTF-8, ASCII, etc.
text = "Hello"
encoded = text.encode("utf-8")
decoded = encoded.decode("utf-8")
Encoding is crucial for file I/O, networking, and internationalization.
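The round trip above uses only ASCII characters, where the choice of encoding makes no visible difference. The following sketch uses an illustrative non-ASCII string to show that the byte length can differ from the character count, and that an encoding which cannot represent a character raises an error unless told otherwise.

```python
text = "caf\u00e9"  # "café", written with an escape so the code point is unambiguous

utf8_bytes = text.encode("utf-8")
print(len(text), len(utf8_bytes))  # 4 5 (the accented letter takes two bytes in UTF-8)

# ASCII cannot represent the accented letter, so a strict encode raises an error
try:
    ascii_bytes = text.encode("ascii")
except UnicodeEncodeError:
    # errors="replace" substitutes "?" for characters that cannot be encoded
    ascii_bytes = text.encode("ascii", errors="replace")

print(ascii_bytes)  # b'caf?'
```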
Escape characters allow insertion of special characters such as newline, tab, quotes, and backslashes within strings.
text = "Line1\nLine2\tTabbed"
quote = "She said, \"Hello\""
path = "C:\\Users\\Admin"
Escaping ensures proper interpretation of special characters.
Searching refers to finding the presence or position of a substring within a string.
text = "Hello World"
text.find("World")
text.index("World")
"Hello" in text
This is useful for validation, filtering, and pattern detection.
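The two search methods differ in how they report a missing substring, which matters when choosing between them. A brief sketch:

```python
text = "Hello World"

print(text.find("World"))   # 6
print(text.find("Python"))  # -1 (find() signals "not found" with -1)

# index() raises ValueError instead of returning -1
try:
    position = text.index("Python")
except ValueError:
    position = None

print(position)  # None
```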
Replacement modifies parts of a string by substituting one substring with another.
text = "I like Java"
new_text = text.replace("Java", "Python")
Replacement is frequently used in data cleaning and formatting.
Case conversion changes the capitalization style of a string.
text = "hello world"
text.upper()       # 'HELLO WORLD'
text.lower()       # 'hello world'
text.title()       # 'Hello World'
text.capitalize()  # 'Hello world'
text.swapcase()    # 'HELLO WORLD'
Case normalization is critical for comparisons and user input processing.
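For case-insensitive comparison of text that may contain non-English letters, `str.casefold()` is a more aggressive alternative to `lower()`. The sample words below are only an illustration.

```python
user_input = "STRASSE"
stored = "stra\u00dfe"  # "straße", using the German sharp s

# lower() leaves the sharp s unchanged, so the two spellings do not match
print(user_input.lower() == stored.lower())        # False

# casefold() maps the sharp s to "ss", so the comparison succeeds
print(user_input.casefold() == stored.casefold())  # True
```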
Trimming removes unwanted whitespace from the beginning and/or end of strings.
text = " Hello "
text.strip()   # 'Hello'
text.lstrip()  # 'Hello '
text.rstrip()  # ' Hello'
This ensures clean and consistent data.
Splitting divides a string into a list of substrings based on a delimiter.
text = "apple,banana,orange"
fruits = text.split(",")
Splitting is commonly used when processing CSV and structured text.
Joining combines multiple strings into one using a specified separator.
words = ["Hello", "World"]
sentence = " ".join(words)
Joining is efficient and avoids repeated concatenation.
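A short sketch of two common uses of `join()`: choosing a different separator, and converting non-string items first, since `join()` accepts only strings. The sample lists are illustrative.

```python
words = ["alpha", "beta", "gamma"]
csv_line = ",".join(words)
print(csv_line)  # alpha,beta,gamma

# join() raises TypeError for non-string items, so convert them first
numbers = [1, 2, 3]
number_line = "-".join(str(n) for n in numbers)
print(number_line)  # 1-2-3
```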
Concatenation combines two or more strings into one.
a = "Hello"
b = "World"
result = a + " " + b
Concatenation is fundamental to building dynamic text.
Repetition creates multiple copies of a string using the multiplication operator.
text = "Hi"
repeat = text * 3
This is useful for patterns, separators, and formatting.
Indexing accesses individual characters of a string using their position.
text = "Python"
text[0]   # 'P'
text[-1]  # 'n'
Indexing is zero-based and supports negative indexing.
Slicing extracts a portion of a string using start, stop, and step indices.
text = "Python Programming"
text[0:6]
text[7:]
text[::-1]
Slicing enables substring extraction and reversal.
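The sketch below, using a shorter illustrative string, shows a few slicing behaviors that are easy to overlook: the stop index is exclusive, negative indices count from the end, and out-of-range slice bounds are clamped rather than raising an error.

```python
word = "Python"

middle = word[1:4]        # 'yth'  (index 4 itself is excluded)
last_three = word[-3:]    # 'hon'
every_second = word[::2]  # 'Pto'
clamped = word[2:100]     # 'thon' (a stop beyond the end is clamped)

# Splitting at any position and rejoining gives back the original string
rejoined = word[:2] + word[2:]
print(middle, last_three, every_second, clamped, rejoined == word)
```

Indexing a single position out of range, such as `word[100]`, still raises IndexError; only slices are forgiving.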
Length returns the number of characters in a string.
text = "Hello"
length = len(text)
Length is used for validation, iteration, and layout control.
Iteration loops through each character in a string.
text = "Python"
for char in text:
    print(char)
Iteration enables character-level processing.
Membership checks whether a substring exists within a string.
"text" in "This is a text"  # True
"abc" not in "hello"        # True
This is commonly used in conditional logic.
Strings are immutable, meaning their contents cannot be changed after creation. Any modification creates a new string.
text = "Hello"
# text[0] = "h"  # not allowed: raises TypeError
text = "h" + text[1:]
Immutability makes strings safe to share between variables and lets them be used as dictionary keys.
Validation checks whether a string meets certain conditions such as being numeric, alphabetic, uppercase, etc.
text = "12345"
text.isdigit()
text.isalpha()
text.isalnum()
text.islower()
text.isupper()
Validation is crucial for input handling and data quality.
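These checks are stricter than they may first appear. As a small illustration with made-up inputs, `isdigit()` rejects decimal points, signs, and the empty string, so it is not a general "is this a number" test.

```python
samples = ["12345", "12.5", "-5", ""]

# Map each sample to the result of isdigit()
checks = {s: s.isdigit() for s in samples}
print(checks)  # {'12345': True, '12.5': False, '-5': False, '': False}
```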
Normalization ensures consistent representation of characters, especially for Unicode strings.
import unicodedata
text = "é"
normalized = unicodedata.normalize("NFC", text)
Normalization avoids mismatches caused by different Unicode forms.
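To make the mismatch concrete, the following sketch builds the same visible character in two ways: as a single code point and as a base letter plus a combining accent. The variable names are illustrative.

```python
import unicodedata

composed = "\u00e9"     # "é" as a single code point
decomposed = "e\u0301"  # "e" followed by a combining acute accent

print(composed == decomposed)          # False: different code point sequences
print(len(composed), len(decomposed))  # 1 2

# After normalizing both to NFC, the two strings compare equal
nfc = unicodedata.normalize("NFC", decomposed)
print(composed == nfc)                 # True
```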
Tokenization splits text into meaningful units (tokens) such as words or sentences.
text = "Natural language processing"
tokens = text.split()
Tokenization is fundamental in NLP and text analysis.
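Splitting on whitespace leaves punctuation attached to words. One simple alternative, shown here as a rough sketch rather than a full tokenizer, is a regular expression that keeps only letters and apostrophes. The pattern and sample sentence are assumptions for illustration.

```python
import re

text = "Hello, world! Tokenization isn't trivial."

whitespace_tokens = text.split()
print(whitespace_tokens)  # ['Hello,', 'world!', 'Tokenization', "isn't", 'trivial.']

# Keep runs of ASCII letters and apostrophes; punctuation is dropped
word_tokens = re.findall(r"[A-Za-z']+", text)
print(word_tokens)        # ['Hello', 'world', 'Tokenization', "isn't", 'trivial']
```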
Pattern matching identifies substrings that match specific patterns using regex.
import re
pattern = r"\b[A-Z][a-z]+\b"
text = "Alice went to London"
matches = re.findall(pattern, text)
Pattern matching enables advanced text extraction and validation.
Python strings are Unicode by default, allowing representation of characters from all languages.
text = "こんにちは"
print(text)
Unicode support enables globalized applications.
Formatting options control width, precision, alignment, and numeric formatting.
value = 12.34567
formatted = f"{value:.2f}"
These options ensure clean and professional output formatting.
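Beyond precision, the same format specification covers width, alignment, thousands separators, and percentages. A brief sketch with illustrative values:

```python
value = 1234.5678

fixed = f"{value:.2f}"       # '1234.57'      two decimal places
grouped = f"{value:,.2f}"    # '1,234.57'     thousands separator
padded = f"{value:>12.2f}"   # '     1234.57' right-aligned in 12 characters
percent = f"{0.256:.1%}"     # '25.6%'        multiplied by 100, one decimal
label = f"{'left':<8}|"      # 'left    |'    left-aligned in 8 characters

print(fixed, grouped, padded, percent, label, sep="\n")
```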
Padding adds extra characters to a string to achieve a desired length.
text = "42"
text.zfill(5)
text.ljust(5, "*")
text.rjust(5, "*")
Padding is useful for alignment and formatting output.
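Zero padding is often applied to numbers rather than to existing strings. As a small sketch with a made-up `order_number`, both `zfill()` and an f-string format specification give the same result, and `zfill()` keeps a leading sign in front of the zeros.

```python
order_number = 42

padded_method = str(order_number).zfill(6)  # '000042'
padded_fstring = f"{order_number:06d}"      # '000042'

# zfill() inserts zeros after a leading sign
negative = "-42".zfill(6)                   # '-00042'

print(padded_method, padded_fstring, negative)
```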
Alignment controls how text is positioned within a given width.
text = "Hello"
text.center(10, "-")
text.ljust(10)
text.rjust(10)
Alignment is commonly used in tables and reports.
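A minimal sketch of how alignment might be used to line up columns in a plain-text report. The `rows` data and the column widths are made up for illustration.

```python
rows = [("Apples", 3), ("Kiwis", 12)]

# Left-align names in 8 characters and right-align counts in 4
lines = [name.ljust(8) + str(count).rjust(4) for name, count in rows]

for line in lines:
    print(line)
```

Every line comes out 12 characters wide, so the counts share a right edge.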
Whitespace handling removes or manages spaces, tabs, and newlines in strings.
text = " \t Hello \n "
cleaned = text.strip()
Whitespace handling is essential for data cleaning.
Error handling manages exceptions that occur during string operations.
text = "123"
try:
    number = int(text)
except ValueError:
    print("Conversion failed")
Error handling ensures robust and fault-tolerant programs.
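The same pattern can be wrapped in a small reusable function. This is one possible sketch; the name `to_int` and its `default` parameter are hypothetical, not part of the standard library.

```python
def to_int(text, default=None):
    """Hypothetical helper: convert text to int, or return default on failure."""
    try:
        return int(text)
    except ValueError:
        return default


print(to_int("123"))             # 123
print(to_int("12a"))             # None
print(to_int("12a", default=0))  # 0
print(to_int(" 42 "))            # 42 (int() ignores surrounding whitespace)
```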
Python may reuse identical string objects (string interning) to save memory. This is an implementation detail, so compare string values with == rather than is.
a = "hello"
b = "hello"
print(a is b) # May return True
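When shared identity is actually needed, `sys.intern()` requests it explicitly. The sketch below builds its strings at runtime so that they are not shared compile-time constants; the `_interned` names are illustrative.

```python
import sys

# Build two equal strings at runtime
a = "".join(["hello", " ", "world"])
b = "".join(["hello", " ", "world"])

print(a == b)  # True: same characters
# Whether `a is b` holds here is an implementation detail, so it is not relied on

# sys.intern() returns the canonical copy, so equal interned strings share identity
a_interned = sys.intern(a)
b_interned = sys.intern(b)
print(a_interned is b_interned)  # True
```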
Raw strings treat backslashes as literal characters, useful in regex and file paths.
path = r"C:\Users\Admin\Documents"
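A raw string is only a different way of writing a literal; the resulting object is an ordinary string. A quick sketch comparing the raw and escaped spellings:

```python
escaped = "C:\\Users\\Admin"
raw = r"C:\Users\Admin"

print(escaped == raw)         # True: both contain single backslashes

# In a normal literal "\n" is one newline character; in a raw literal it is two characters
print(len("\n"), len(r"\n"))  # 1 2
```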
Triple-quoted strings allow multi-line text.
text = """This is
a multi-line
string."""
Strings represent text, while bytes represent binary data.
text = "Hello"
byte_data = b"Hello"
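Although the two literals look alike, they are different types and never compare equal. The sketch below shows the difference and how to convert between them, assuming ASCII content.

```python
text = "Hello"
byte_data = b"Hello"

print(text == byte_data)                  # False: str and bytes are never equal
print(byte_data.decode("ascii") == text)  # True after decoding
print(text.encode("ascii") == byte_data)  # True after encoding

# Indexing a str gives a one-character string; indexing bytes gives an integer
print(text[0], byte_data[0])              # H 72
```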